Multi-Word Unit Dependency Forest-based Translation Rule Extraction

نویسندگان

  • Hwidong Na
  • Jong-Hyeok Lee
چکیده

Translation requires non-isomorphic transformation from the source to the target. However, non-isomorphism can be reduced by learning multi-word units (MWUs). We present a novel way of representating sentence structure based on MWUs, which are not necessarily continuous word sequences. Our proposed method builds a simpler structure of MWUs than words using words as vertices of a dependency structure. Unlike previous studies, we collect many alternative structures in a packed forest. As an application of our proposed method, we extract translation rules in form of a source MWU-forest to the target string, and verify the rule coverage empirically. As a consequence, we improve the rule coverage compare to a previous work, while retaining the linear asymptotic complexity.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multi-task Learning for Word Alignment and Dependency Parsing

Word alignment and parsing are two important components for syntax based machine translation. The inconsistent models for alignment and parsing caused problems during translation pair extraction. In this paper, we do word alignment and dependency parsing in a multi-task learning framework, in which word alignment and dependency parsing are consistent and assisted with each other. Our experiment...

متن کامل

Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

This paper presents a novel filtration criterion to restrict the rule extraction for the hierarchical phrase-based translation model, where a bilingual but relaxed wellformed dependency restriction is used to filter out bad rules. Furthermore, a new feature which describes the regularity that the source/target dependency edge triggers the target/source word is also proposed. Experimental result...

متن کامل

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

متن کامل

Feature extraction in opinion mining through Persian reviews

Opinion mining deals with an analysis of user reviews for extracting their opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in such area. In general, opinion mining extracts user reviews at three levels of document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...

متن کامل

The Induction and Evaluation of Word Order Rules using Corpora based on the Two Concepts of Topological Models

Using dependency trees in natural language generation and machine translation raise the need to derive the word order from dependency trees. This task is difficult for languages with (partly) free word order and comparatively easier for languages with fixed word order. This paper describe (a) the two basic elements of topological models, (b) rule patterns for the mapping of dependency trees to ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011